Toward a Unified Retrieval Outcome Analysis Framework for Cross-Language Information Retrieval

Author

  • Jiangping Chen
Abstract

This paper proposes a Retrieval Outcome Analysis Framework, or ROA Framework, to systematically evaluate the retrieval performance of Cross-Language Information Retrieval systems. The ROA framework goes beyond TREC-type retrieval evaluation methodology by including procedures that focus on individual queries, especially difficult ones. The framework comprises four interrelated components: (1) Overall System Performance Evaluation, (2) Query Categorization, (3) Translation Analysis, and (4) Individual Query Analysis. An example of applying the framework is discussed in detail. The author believes the proposed framework will be especially useful for developing real-world Cross-Language Information Retrieval systems, because an evaluation guided by the framework can help uncover the causes of poor retrieval performance.

Introduction

Cross-Language Information Retrieval (CLIR) is a special case of Information Retrieval (IR). It explores ways of finding relevant documents in a collection written in a language (or languages) different from that of the user's query. A CLIR system often behaves quite differently in response to different queries: for some queries it returns relevant documents or web pages among the top-ranked results, while for others it fails to find any relevant documents or ranks them very low. In the latter case, users either cannot obtain the information they need or must work through a long list of returned documents to locate it. CLIR evaluation is therefore an essential part of CLIR system design and development. A well-designed evaluation guided by sound methodology should identify the strengths and weaknesses of the system, especially the causes of unsatisfactory retrieval performance on particular queries, and provide evidence for system improvement.
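
The three per-query outcomes described above (relevant documents ranked at the top, ranked very low, or not found at all) can be sketched as a simple labeling step. This is a minimal Python illustration, assuming each query has a ranked list of retrieved document ids and a set of judged-relevant ids. The function names, the labels, and the top-10 cutoff for "top-ranked" are hypothetical choices for illustration, not necessarily how the paper's Query Categorization component groups queries.

```python
from typing import Dict, List, Optional, Set


def first_relevant_rank(ranking: List[str], relevant: Set[str]) -> Optional[int]:
    """Return the 1-based rank of the first relevant document, or None if none is retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return rank
    return None


def categorize_queries(
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    top_k: int = 10,
) -> Dict[str, str]:
    """Label each query by where its first relevant document appears.

    'success'    : a relevant document appears within the top_k results
    'ranked_low' : relevant documents are retrieved, but only below top_k
    'failed'     : no relevant document is retrieved at all
    top_k = 10 is an assumed cutoff, not a value taken from the paper.
    """
    labels: Dict[str, str] = {}
    for qid, ranking in runs.items():
        rank = first_relevant_rank(ranking, qrels.get(qid, set()))
        if rank is None:
            labels[qid] = "failed"
        elif rank <= top_k:
            labels[qid] = "success"
        else:
            labels[qid] = "ranked_low"
    return labels


if __name__ == "__main__":
    # Hypothetical toy run: query id -> ranked document ids.
    runs = {
        "q1": ["d3", "d7", "d1"],
        "q2": [f"x{i}" for i in range(11)] + ["d5"],  # relevant doc at rank 12
        "q3": ["d8", "d2"],
    }
    # Hypothetical relevance judgments: query id -> relevant document ids.
    qrels = {"q1": {"d7"}, "q2": {"d5"}, "q3": {"d4"}}
    print(categorize_queries(runs, qrels))
    # {'q1': 'success', 'q2': 'ranked_low', 'q3': 'failed'}
```

Labeling by the rank of the first relevant document is only one possible criterion; it separates "ranked very low" from "not found", which a single averaged score cannot do.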
However, current CLIR evaluation, like monolingual IR system evaluation, focuses more on average performance over multiple topics than on individual topics, as Hu, Bandhakavi, and Zhai (2003) have pointed out. Few systems or researchers have performed systematic, in-depth analysis of individual queries or topics. In particular, researchers have paid little attention to difficult queries or topics, for which IR or CLIR systems fail to find relevant documents or answers, or rank them very low. Consequently, little is known about why some queries are more difficult than others. Current IR evaluation as conducted by TREC (http://trec.nist.gov/) may help a system improve its overall performance, but it has a limited effect on difficult queries because current TREC evaluations lack methods for in-depth retrieval analysis. The author believes that it is necessary to explore the methodological issues of conducting analysis at the individual query level in order to understand the causes behind IR system performance. Such an investigation would benefit IR systems, especially real-world information access and retrieval systems, by allowing system designers to adjust their retrieval and user interaction strategies to better serve their users. In this paper, the author introduces a concept called Retrieval Outcome Analysis (ROA). ROA refers to a series of analytical procedures that systematically evaluate information retrieval on individual queries. In contrast to the traditional, TREC-like IR system evaluation paradigm, ROA focuses on exploring the causes behind retrieval performance on individual queries. A well-designed ROA should provide evidence explaining why a system performs well on certain topics and poorly on others, rather than only reporting precision and recall scores.
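
To make the contrast between averaged and per-query evaluation concrete, here is a minimal Python sketch, assuming TREC-style inputs (a ranked list of document ids per query and a set of relevant ids per query). It computes non-interpolated average precision (AP) for each query and their mean (MAP), then flags queries whose AP falls below a threshold. The function names, the toy data, and the 0.1 threshold are hypothetical; the paper does not give a numeric criterion for a "difficult" query.

```python
from typing import Dict, List, Set, Tuple


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one query.

    Precision is summed at the rank of each retrieved relevant document and
    divided by the total number of relevant documents, so relevant documents
    that are never retrieved contribute zero. A query with no relevant
    documents is assigned 0.0 here (an assumed convention).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def per_query_report(
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    threshold: float = 0.1,
) -> Tuple[Dict[str, float], float, List[str]]:
    """Return per-query AP, their mean (MAP), and the ids of queries below threshold.

    threshold = 0.1 is an assumed cut-off for 'difficult', not one from the paper.
    """
    ap = {qid: average_precision(r, qrels.get(qid, set())) for qid, r in runs.items()}
    mean_ap = sum(ap.values()) / len(ap) if ap else 0.0
    difficult = sorted(qid for qid, score in ap.items() if score < threshold)
    return ap, mean_ap, difficult


if __name__ == "__main__":
    # Hypothetical toy run and relevance judgments.
    runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"], "q3": ["d7", "d8"]}
    qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}, "q3": {"d9"}}
    ap, mean_ap, difficult = per_query_report(runs, qrels)
    print(f"MAP = {mean_ap:.3f}; difficult queries: {difficult}")
    # MAP = 0.389; difficult queries: ['q3']
```

In the toy run, a MAP of about 0.39 looks moderate, yet q3 retrieves no relevant document at all. Keeping the per-query scores alongside the mean is what exposes such queries for further analysis.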
To demonstrate the usefulness of ROA and the procedures it involves, the author proposes an ROA framework as a methodology for CLIR system evaluation. The framework, built upon the ROA concept, is presented and illustrated in the remainder of this paper. The next section, "Related Research," reviews current IR system evaluation strategies and studies that have contributed to IR and CLIR performance analysis methodologies. The third section presents the ROA framework for CLIR. The fourth section provides

Similar resources

Semantic annotation for concept-based cross-language medical information retrieval

We present a framework for concept-based cross-language information retrieval in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data. Documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes part-of-speech ...

Cross-Lingual Medical Information Retrieval through Semantic Annotation

We present a framework for concept-based, cross-lingual information retrieval (CLIR) in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data, whereby documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes ...

Image Retrieval Using Dynamic Weighting of Compressed High Level Features Framework with LER Matrix

In this article, a new method for database retrieval is proposed. The multi-resolution modified wavelet transform of each image is computed, and the standard deviation and average are used as textural features. Then, the proposed modified bit-based color histogram and edge detectors are used to define the high-level features. A feedback-based dynamic weighting of shap...

Matching Meaning for Cross-Language Information Retrieval

This article describes a framework for cross-language information retrieval that efficiently leverages statistical estimation of translation probabilities. The framework provides a unified perspective into which some earlier work on techniques for cross-language information retrieval based on translation probabilities can be cast. Modeling synonymy and filtering translation probabilities using ...

Structured queries, language modeling, and relevance modeling in cross-language information retrieval

Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in a...

Publication date: 2005